Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition

Author

Panu Somervuo

Abstract

The present work compares three methods for training acoustic models in a Finnish large vocabulary speech recognition system. The models are trained using the maximum likelihood (ML), maximum a posteriori (MAP), and variational Bayesian (VB) principles. The results show that when the model complexity is properly chosen, all three methods give similar performance. As the model complexity increases, the performance of the ML-based system starts to degrade, whereas no overfitting is observed with the MAP- and VB-based models. MAP gives slightly better recognition accuracy than VB, but it cannot be used for model selection without auxiliary data. The advantage of VB is that it can select a well-performing model structure using only training data.
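
For orientation, the three training principles named in the abstract can be written in their standard textbook form. This is a general sketch, not the paper's own formulation: the symbols $X$ (training data), $Z$ (hidden variables such as HMM state sequences), $\theta$ (model parameters), $m$ (model structure), and $q$ (variational posterior) are notation assumed here for illustration.

```latex
\begin{align}
  \hat{\theta}_{\mathrm{ML}}
    &= \arg\max_{\theta} \, \log p(X \mid \theta) \\
  \hat{\theta}_{\mathrm{MAP}}
    &= \arg\max_{\theta} \, \bigl[ \log p(X \mid \theta) + \log p(\theta) \bigr] \\
  \log p(X \mid m)
    &\ge \mathcal{F}_m(q)
     = \mathbb{E}_{q(Z,\theta)}\bigl[ \log p(X, Z, \theta \mid m) - \log q(Z, \theta) \bigr]
\end{align}
```

ML and MAP both return a point estimate of $\theta$; MAP adds a prior term that regularizes the estimate. VB instead maximizes the lower bound $\mathcal{F}_m(q)$ over $q$, and because the bound approximates the log evidence of structure $m$, it can be compared across model structures using training data alone.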

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Automatic Generation of Non-uniform and Context-Dependent HMMs Based on the Variational Bayesian Approach

We propose a new method both for automatically creating non-uniform, context-dependent HMM topologies, and selecting the number of mixture components based on the Variational Bayesian (VB) approach. Although the Maximum Likelihood (ML) criterion is generally used to create HMM topologies, it has an over-fitting problem. Recently, to avoid this problem, the VB approach has been applied to create...

Publication date: 2004
